Digging Deep for Ancient Relics: A Survey of Protein Motifs in the Intergenic Sequences of Four Eukaryotic Genomes

Running title: Survey of Protein Pseudomotifs in Intergenic Region

Authors

  • ZhaoLei Zhang
  • Paul Harrison
  • Mark Gerstein
Abstract

We have examined conserved protein motifs in the non-coding, intergenic regions ("pseudomotif patterns") and surveyed their occurrence in the fly, worm, yeast and human genomes (chromosomes 21 and 22 only). To identify these patterns, we masked out annotated genes, pseudogenes and repeat regions from the raw genomic sequence and then compared the remaining sequence, in six-frame translation, against 1319 patterns from the PROSITE database. For each pseudomotif pattern, the absolute number of occurrences is not very informative unless compared against a statistical expectation; consequently, we calculated the expected occurrence of each pattern using a Poisson model and verified this with simulations. Using a p-value cutoff of 0.01, we found 67 pseudomotif patterns over-represented in fly intergenic regions, 34 in worm, 21 in human and 6 in yeast. These include the zinc finger, leucine zipper, nucleotide-binding motif and EGF domain. Many of the over-represented patterns were common to two or more organisms, but a few were unique to specific ones. Furthermore, we found more over-represented patterns in the fly than in the worm, although the fly has fewer pseudogenes. This puzzling observation can be explained by a higher deletion rate in the fly genome. We also surveyed under-represented patterns, finding 23 in the fly, 12 in worm, 18 in human and 2 in yeast. If intergenic sequences were truly random, we would expect an equal number of over- and under-represented patterns. The fact that for each organism the number of over-represented patterns is greater than the number of under-represented ones implies that a fraction of the intergenic regions consists of ancient protein fragments that, due to accumulated disablements, have become unrecognizable to conventional techniques for gene and pseudogene identification. Moreover, we find that in aggregate the over-represented pseudomotif patterns occupy a substantial fraction of the intergenic regions.
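The over- and under-representation test described in the abstract can be illustrated with a minimal sketch. It assumes that an expected count for a pattern is already available (the abstract does not say how the authors derived it, so `expected` is simply an input here), and it treats the two one-sided Poisson tail probabilities as the p-values compared against the 0.01 cutoff. The function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k occurrences under a Poisson mean lam."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    # Work in log space so large k does not overflow the factorial.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    if k < 0:
        return 0.0
    return min(1.0, sum(poisson_pmf(i, lam) for i in range(k + 1)))


def classify(observed, expected, alpha=0.01):
    """Label a pattern 'over', 'under' or 'neither' at cutoff alpha.

    observed: number of matches counted in the masked intergenic sequence.
    expected: Poisson mean for that pattern (assumed given; hypothetical).
    """
    p_over = max(0.0, 1.0 - poisson_cdf(observed - 1, expected))  # P(X >= observed)
    p_under = poisson_cdf(observed, expected)                     # P(X <= observed)
    if p_over < alpha:
        return "over"
    if p_under < alpha:
        return "under"
    return "neither"


if __name__ == "__main__":
    # Made-up counts, purely to show the three possible outcomes.
    for obs, exp in [(30, 10.0), (1, 10.0), (10, 10.0)]:
        print(obs, exp, classify(obs, exp))
```

Subtracting the cumulative probability from 1 loses precision for extremely small p-values, which does not matter for a 0.01 cutoff; a production analysis would more likely use a statistics library's survival function.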


Similar resources


Molecular typing of avian Escherichia coli isolates by enterobacterial repetitive intergenic consensus sequences-polymerase chain reaction (ERIC-PCR)

BACKGROUND: Colibacillosis is one of the most economically important diseases of poultry worldwide. OBJECTIVES: This study was conducted to examine the clonal relatedness and typing of 95 avian Escherichia coli isolates by ERIC-PCR. METHODS: Sixty-three E. coli isolates from two common manifestations of colibacillosis (yolk sac infection and colisepticemia) and 32 isolates from feces of apparen...


Phylogenetic Analysis of Some Luffa Genotypes Based on the sequence of intergenic region of trnH-psbA

Luffa (Luffa cylindrica) is a plant from the Cucurbitaceae family that grows mostly in tropical and subtropical regions, as well as in most regions of Iran. In this research, the genetic diversity of nine native and non-native genotypes of L. cylindrica was investigated through the evaluation of the chloroplast trnH-psbA intergenic region (IGS). After sampling the young leaves, DNA extraction w...


Microsatellites in different eukaryotic genomes: survey and analysis.

We examined the abundance of microsatellites with repeated unit lengths of 1-6 base pairs in several eukaryotic taxonomic groups: primates, rodents, other mammals, nonmammalian vertebrates, arthropods, Caenorhabditis elegans, plants, yeast, and other fungi. Distribution of simple sequence repeats was compared between exons, introns, and intergenic regions. Tri- and hexanucleotide repeats prevai...


Enterobacterial Small Mobile Sequences Carry Open Reading Frames and are Found Intragenically—Evolutionary Implications for Formation of New Peptides

Intergenic repeat units of 127-bp (RU-1) and 168-bp (RU-2), as well as a newly-found class of 103-bp (RU-3), represent small mobile sequences in enterobacterial genomes present in multiple intergenic regions. These repeat sequences display similarities to eukaryotic miniature inverted-repeat transposable elements (MITE). The RU mobile elements have not been reported to encode amino acid sequenc...



Publication date: 2002